ENCODER WITH ADAPTIVE RATE CONTROL

Rate control is necessary in a JVT video encoder to achieve a particular
constant bitrate when one is needed for fixed-channel-bandwidth applications with
limited buffer sizes. Avoiding buffer overflow and underflow is more challenging on
video content that contains sections with different complexity characteristics, for
example scene changes and dissolves.

Rate control has been studied for previous video compression standards.
TMN8 [1] was proposed for H.263. The TMN8 rate control uses a frame-layer rate
control to select the target number of bits for the current frame and a
macroblock-layer rate control to select the value of QP for the macroblocks [4].

In the frame-layer rate control, the target number of bits for the current frame
is determined by

$$B = R/F - \Delta, \qquad (1)$$

$$\Delta = \begin{cases} W/F, & W > Z \cdot M \\ W - Z \cdot M, & \text{otherwise} \end{cases} \qquad (2)$$

$$W = \max(W_{prev} + B' - R/F,\ 0), \qquad (3)$$

where $B$ is the target number of bits for a frame, $R$ is the channel rate in bits
per second, $F$ is the frame rate in frames per second, $W$ is the number of bits in the
encoder buffer, $M$ is the maximum buffer size, $W_{prev}$ is the previous number of bits in
the buffer, $B'$ is the actual number of bits used for encoding the previous frame, and
$Z = 0.1$ is set by default to achieve low delay.
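
The frame-layer update can be sketched in a few lines of Python. This is only an illustration of equations (1) to (3) under the definitions given in the text; the function name `tmn8_frame_target` and its argument names are hypothetical and do not come from any reference software.

```python
def tmn8_frame_target(R, F, W_prev, B_prev_actual, M, Z=0.1):
    """Return (B, W): frame bit target and updated buffer level, per equations (1)-(3)."""
    # Equation (3): add the bits just produced, drain one frame interval of channel bits.
    W = max(W_prev + B_prev_actual - R / F, 0.0)
    # Equation (2): correction term that depends on how full the buffer is.
    delta = W / F if W > Z * M else W - Z * M
    # Equation (1): channel bits per frame minus the correction.
    B = R / F - delta
    return B, W
```

With an empty buffer the correction is negative, so the target exceeds the per-frame channel budget R/F; once the buffer holds more than Z*M bits the target drops below R/F.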
The macroblock-layer rate control selects the value of the quantization step
size for all the macroblocks in a frame, so that the sum of the macroblock bits is
close to the frame target $B$. The optimal quantization step size $Q_i^*$ for macroblock $i$ in
a frame can be determined by

$$Q_i^* = \sqrt{\frac{A K}{\beta_i - A N_i C} \cdot \frac{\sigma_i}{\alpha_i} \sum_{k=i}^{N} \alpha_k \sigma_k}, \qquad (4)$$

where $K$ is the model parameter, $A$ is the number of pixels in a macroblock, $N_i$
is the number of macroblocks that remain to be encoded in the frame, $\sigma_i$ is the
standard deviation of the residue in the $i$th macroblock, $\alpha_i$ is the distortion weight of
the $i$th macroblock, $C$ is the overhead rate, and $\beta_i$ is the number of bits left for
encoding the frame, with $\beta_1 = B$ at the initialization stage.
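
As an illustration of equation (4), the following Python sketch computes the step size for one macroblock. It assumes the published TMN8 form of the model, in which the sum runs over the macroblocks not yet coded; the function name, the 0-based index, and the guard against a non-positive bit budget are assumptions added for the example.

```python
import math

def tmn8_mb_qstep(i, sigma, alpha, beta_i, A, K, C):
    """Quantization step size for macroblock i (0-based) under the TMN8 model."""
    N = len(sigma)
    N_i = N - i  # macroblocks still to be encoded, including macroblock i
    # Weighted residue energy of the macroblocks that remain (assumed sum range).
    S_i = sum(a * s for a, s in zip(alpha[i:], sigma[i:]))
    # Bits left for texture after removing the estimated overhead.
    budget = beta_i - A * N_i * C
    if budget <= 0:
        # Assumption: the model is undefined here, so signal it to the caller.
        raise ValueError("no bits left after overhead")
    return math.sqrt(A * K / budget * sigma[i] / alpha[i] * S_i)
```

After each macroblock is coded, the caller would subtract the bits actually spent from `beta_i` before evaluating the next macroblock.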

The TMN8 scheme is simple and is known to achieve both high
quality and accurate bit rate, but it is not well suited to H.264. Rate-distortion
optimization (RDO) (i.e., rate-constrained motion estimation and mode decision) is a
widely accepted approach in H.264 for mode decision and motion estimation, where
the quantization parameter (QP) (used to decide $\lambda$ in the Lagrangian optimization)
needs to be decided before RDO is performed [5]. But the TMN8 model requires the
statistics of the prediction error signal (residue) to estimate QP, which means that motion
estimation and mode decision need to be performed before QP is determined, thus
resulting in a chicken-and-egg dilemma.

The methods disclosed in [2] and [3] have been proposed for H.264 rate
control. Method [2] has been incorporated into the JVT JM reference software
release JM7.4 [4]. To overcome the chicken-and-egg dilemma mentioned above,
method [2] uses the residue of the collocated macroblock in the most recently coded
picture of the same type to predict that of the current macroblock, and method [3]
employs a two-step encoding, where the QP of the previous picture ($QP_{prev}$) is first
used to generate the residue, and then the QP of the current macroblock is estimated
based on the residue. The former approach is simple, but it lacks precision. The
latter approach is more accurate, but it requires multiple encoding passes, thus adding
much complexity.

In this invention, we build upon the model used in TMN8 of H.263+ [4]. This
model uses Lagrangian optimization to minimize distortion subject to the target
bitrate constraint. To adapt the model to H.264 and to further improve the
performance, we have to consider several issues. First, rate-distortion optimization
(RDO) (i.e., rate-constrained motion estimation and mode decision) is a widely
accepted approach in H.264 for mode decision and motion estimation, where the
quantization parameter (QP) (used to decide $\lambda$ in the Lagrangian optimization) needs
to be decided before RDO is performed [5]. But the TMN8 model requires the
statistics of the prediction error signal (residue) to estimate QP, which means that motion
estimation and mode decision need to be performed before QP is determined, thus
resulting in a chicken-and-egg dilemma. Second, TMN8 is targeted at low-delay
applications, but H.264 can be used for various applications. Therefore a new bit
allocation and buffer management scheme is needed for various content. Third,
TMN8 adapts the QP at the macroblock level. Although a constraint is placed on the QP
difference (DQUANT) between the current macroblock and the last coded macroblock,
large QP variations within the same picture can be observed and have a negative
subjective effect. In addition, it is known that using a constant QP
for the whole image may save the additional bits needed for coding DQUANT, thus achieving
higher PSNR at very low bit rates. Finally, H.264 uses a 4x4 integer transform, and if
the codec uses thresholding techniques such as those in the JM reference software [4],
details may be lost. Therefore, it is useful to adopt a perceptual model in the rate
control to maintain the details.

Preprocessing Stage

From equation (4), we can see that the TMN8 model requires knowledge
of the standard deviation of the residue to estimate QP. However, RDO requires
knowledge of the QP to perform motion estimation and mode decision and thus to
produce the residue. To overcome this dilemma, [2] uses the residue of the
collocated macroblock in the most recently coded picture of the same type to
predict that of the current macroblock, and [3] employs a two-step encoding, where
the QP of the previous picture ($QP_{prev}$) is first used to generate the residue, and then
the QP of the current macroblock is estimated based on the residue. The former
approach is simple, but it lacks precision. The latter approach is more accurate, but it
requires multiple encoding passes, thus adding too much complexity.

In our approach, we adopt a different method to estimate the residue, which is
simpler than the method of [3] but more accurate than the method of [2].
Experiments show that a simple preprocessing stage can give a good estimate of
the residue. For I pictures, we test only the 3 most probable intra16x16 modes
(vertical, horizontal and DC), and the MSE (Mean Square Error) of the prediction
residual is used to select the best mode. Only three modes are tested in order to
reduce complexity. However, in other embodiments of the current invention more or
fewer modes can be tested.

The spatial residue is then generated using the best mode. It should be noted that
we use the original pixel values for intra prediction instead of reconstructed ones,
simply because the reconstructed pixels are not available. For P pictures, we
perform a rate-constrained motion search using only the 16x16 block type and 1
reference picture. The temporal residue is generated using the best motion vector in
this mode. The average QP of the previously coded picture is used to decide $\lambda$ for the
rate-constrained motion search. Experiments show that, by constraining the
difference in QP between the previously coded picture and the current picture, the $\lambda$ based
on $QP_{prev}$ has a minor impact on motion estimation. A side advantage of this
approach is that the resulting motion vectors from the preprocessing step can be used
as initial motion vectors in the motion estimation during encoding.
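
The I-picture part of the preprocessing stage can be sketched as follows. This is a simplified illustration, not the encoder's actual code: the function names, the use of plain Python lists, the rounding of the DC value, and the fallback DC value of 128 when no neighbours exist are all assumptions. The `top` and `left` arguments hold original (not reconstructed) neighbouring pixels, as the text describes.

```python
def best_intra16_mode(block, top, left):
    """Pick vertical, horizontal or DC prediction by MSE; return (mode, mse, residual)."""
    n = len(block)
    preds = {}
    if top is not None:
        preds["vertical"] = [list(top) for _ in range(n)]          # copy the row above
    if left is not None:
        preds["horizontal"] = [[left[r]] * n for r in range(n)]    # copy the left column
    neigh = (list(top) if top is not None else []) + (list(left) if left is not None else [])
    dc = round(sum(neigh) / len(neigh)) if neigh else 128          # assumed fallback value
    preds["dc"] = [[dc] * n for _ in range(n)]

    def mse(pred):
        return sum((block[r][c] - pred[r][c]) ** 2
                   for r in range(n) for c in range(n)) / (n * n)

    mode = min(preds, key=lambda m: mse(preds[m]))
    residual = [[block[r][c] - preds[mode][r][c] for c in range(n)] for r in range(n)]
    return mode, mse(preds[mode]), residual

def residual_std(residual):
    """Standard deviation of the residue, the sigma used by the QP model."""
    vals = [v for row in residual for v in row]
    mean = sum(vals) / len(vals)
    return (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
```

The example works for any square block size, so a 4x4 block can be used to check it by hand before applying it to 16x16 macroblocks.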

Frame-layer rate control 

TMN8 is targeted at low-delay and low-bit-rate applications, which are assumed to
encode only P pictures after the first I picture. Hence the bit allocation model
shown in equation (1) should be redefined to adapt to the various applications that
use more frequent I pictures. The QP estimation model of equation (4) can result in
large QP variation within one image, so a frame-level QP is better estimated first to
put a constraint on the variation of the MB QP. In addition, at very low bit rates, due to
the overhead of coding DQUANT, it may be more efficient to use a constant
picture QP. So a good rate control scheme should allow rate control at both the
frame level and the MB level.

We first propose a new bit allocation scheme. Then we present a simple
scheme to decide a frame-level QP.

In many applications, e.g. real-time encoders, the encoder does not know
beforehand the total number of frames that need to be coded, or when scene changes
will occur. Thus we adopt a GOP-layer rate control to allocate target bits for each
picture. The H.264 standard does not actually contain a Group of Pictures, but the
terminology is used here to represent the distance between I pictures. The length of
the GOP is indicated by $N_{GOP}$. If $N_{GOP} \to \infty$, we set $N_{GOP} = F$, which corresponds to one
second's length of frames. The notation $BG_{i,j}$ is used to indicate the remaining bits in
GOP $i$ after coding picture $j-1$, equal to

$$BG_{i,j} = \begin{cases} \min(RG_{i-1} + R/F \cdot N_{GOP},\ R/F \cdot N_{GOP} + 0.2 M), & j = 0 \\ BG_{i,j-1} - B'_{i,j-1}, & \text{otherwise} \end{cases} \qquad (5)$$

In the above equation, $RG_{i-1}$ is the number of remaining bits after GOP $i-1$ is
coded, given by $RG_{i-1} = R/F \cdot N_{coded} - B_{coded}$, where $B_{coded}$ is the number of used bits and
$N_{coded}$ is the number of coded pictures after GOP $i-1$ is finished. $B_{i,j}$ and $B'_{i,j}$ are the
target bits and actual used bits for frame $j$ of GOP $i$, respectively. In equation (5), we add one
constraint on the total number of bits allocated for GOP $i$ to prevent buffer
overflow when the complexity level of the content varies dramatically from one GOP
to another. For example, consider a scenario where the previous GOP was of very
low complexity, e.g. all black, so the buffer fullness level would go quite low. Instead
of allocating all of the unused bits from the previous GOP to the current GOP, the
unused bits are distributed over several following GOPs by not allowing more than
$0.2M$ additional bits to an individual GOP. The target frame bits $B_{i,j}$ are then allocated
according to picture type. If the $j$th picture is P, the target bits are $B^P_{i,j} = BG_{i,j}/(K_I N_I + N_P)$,
where $K_I$ is the bit ratio between I pictures and P pictures, which can be estimated
using a sliding window approach, $N_I$ is the remaining number of I pictures in GOP $i$
and $N_P$ is that of P pictures; otherwise, $B^I_{i,j} = K_I B^P_{i,j}$. Since P pictures are used as
references by subsequent P pictures in the same GOP, we allocate more target
bits to P pictures at the beginning of the GOP to ensure that the later P pictures
can be predicted from references of better quality and the coding quality can be
improved. We use a linear weighted P picture target bit allocation as follows:

$$B^P_{i,j} \mathrel{+}= R/F \cdot 0.2 \cdot (N_{GOP} - 2j)/(N_{GOP} - 2). \qquad (6)$$

Another constraint is added to better meet the target bits for a GOP:

$$B_{i,j} \mathrel{+}= 0.1 \cdot B_{diff,j-1},$$

where $B_{diff,j-1} = B_{i,j-1} - B'_{i,j-1}$, clipped as $B_{diff,j-1} = \mathrm{sign}(B_{diff,j-1}) \cdot \min(|B_{diff,j-1}|,\ R/F)$.
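
A minimal Python sketch of the GOP-layer allocation described above follows. All function and parameter names are hypothetical. The 0.1 feedback gain on the previous frame's target mismatch is read from a partly illegible formula in the source and should be treated as an assumption, as should the requirement that the GOP length exceeds 2 so that the linear weight in equation (6) is defined.

```python
import math

def gop_budget(RG_prev, R, F, N_gop, M):
    """Bits available at the start of a GOP (equation (5), case j = 0)."""
    per_gop = R / F * N_gop
    # Carry over leftover bits, but never more than 0.2*M extra per GOP.
    return min(RG_prev + per_gop, per_gop + 0.2 * M)

def frame_target(BG, is_intra, K_I, N_I, N_P, j, R, F, N_gop, B_diff_prev=0.0):
    """Target bits for picture j given the remaining GOP budget BG."""
    B_P = BG / (K_I * N_I + N_P)            # share of one P picture
    if is_intra:
        B = K_I * B_P                       # an I picture gets K_I times a P share
    else:
        # Equation (6): earlier P pictures get more bits (assumes N_gop > 2).
        B = B_P + R / F * 0.2 * (N_gop - 2 * j) / (N_gop - 2)
    # Feed back the previous target-vs-actual mismatch, clipped to one frame of bits.
    clipped = math.copysign(min(abs(B_diff_prev), R / F), B_diff_prev)
    return B + 0.1 * clipped                # 0.1 gain is an assumed reading
```

After each picture is coded, the caller would subtract the bits actually used from `BG` (the second case of equation (5)) and decrement `N_I` or `N_P`.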

In our rate control, we aim at 50% buffer occupancy. To prevent buffer
overflow or underflow, the target bits need to be jointly adapted with the buffer level. The
buffer level $W$ is updated at the end of coding each picture by equation (3). In our
approach, instead of using the real buffer level to adjust the target bits, a virtual buffer
level $W'$ given by $W' = \max(W, 0.4M)$ is proposed. This helps in the scenario where
the previously coded pictures are of very low complexity, such as black scenes, and
consume very few bits, so that the buffer level becomes very low. If we used the real
buffer level to adjust the target frame bits in equation (7), we might allocate too many
bits, which would cause QP to decrease very quickly. After a while, when the scene
returns to normal, the low QP would easily cause the buffer to overflow. We would then need
to either increase QP dramatically or skip frames, which causes the temporal
quality to vary significantly. We therefore adjust the bits by buffer control as

$$B_{i,j} = B_{i,j} \cdot (2M - W')/(M + W'). \qquad (7)$$

To guarantee a minimum level of quality, we set $B_{i,j} = \max(0.6 \cdot R/F,\ B_{i,j})$. To further
avoid buffer overflow and underflow, we set a buffer safety top margin $W_T$ and
bottom margin $W_B$ for I pictures as $W_T^I = 0.75M$ and $W_B^I = 0.25M$. For P pictures,
in compliance with equation (5) and to allow enough buffer for the next I picture in the
next GOP, we set

$$W_T^P = \left(1 - \left(\frac{0.4 - 0.2}{N - 1} \cdot j + 0.2\right)\right) M, \qquad W_B^P = 0.1M.$$

The final target bits are determined as follows. We set $W_{VT} = W + B_{i,j}$ and
$W_{VB} = W_{VT} - R/F$. If $W_{VT} > W_T$, then $B_{i,j} \mathrel{-}= W_{VT} - W_T$; else if
$W_{VB} < W_B$, then $B_{i,j} \mathrel{+}= W_B - W_{VB}$.
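
The buffer-based adjustment can be sketched as below. The function name and signature are hypothetical. The P-picture margins are a reading of a heavily garbled formula in the source (top margin falling linearly from 0.8M to 0.6M over the GOP, bottom margin 0.1M, with N taken to be the GOP length), so those constants are assumptions rather than confirmed values.

```python
def buffer_adjusted_target(B, W, M, R, F, is_intra, j, N_gop):
    """Adjust a frame target B using the virtual buffer level and safety margins."""
    # Virtual buffer level: never treat the buffer as emptier than 0.4*M.
    W_virtual = max(W, 0.4 * M)
    # Equation (7): scale the target around 50% occupancy.
    B = B * (2 * M - W_virtual) / (M + W_virtual)
    # Minimum-quality floor.
    B = max(0.6 * R / F, B)
    if is_intra:
        W_top, W_bot = 0.75 * M, 0.25 * M
    else:
        # Assumed reading: top margin shrinks as the next I picture approaches.
        W_top = (1 - ((0.4 - 0.2) / (N_gop - 1) * j + 0.2)) * M
        W_bot = 0.1 * M
    W_vt = W + B            # buffer level right after coding this picture
    W_vb = W_vt - R / F     # buffer level after draining one frame interval
    if W_vt > W_top:
        B -= W_vt - W_top   # pull back to the top margin
    elif W_vb < W_bot:
        B += W_bot - W_vb   # push up to the bottom margin
    return B
```

At exactly 50% occupancy the scale factor in equation (7) is 1, so a target that respects both margins passes through unchanged.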

We note that if a scene change detector is employed, we encode the
picture at the scene change as an I picture, and a new GOP starts from this I
picture. The above scheme can still be employed.

We propose a new scheme to decide the frame-level QP based on equation (4).
We modify (4) as

$$\hat{Q}_i = \sqrt{\frac{A K}{B - A N C} \cdot \frac{\sigma_i}{\alpha_i} \sum_{k=1}^{N} \alpha_k \sigma_k},$$

where $C$ is the overhead from the last coded picture of the same type, and $\sigma_i$ is estimated
in the preprocessing stage as in Section 3.1. Two approaches can be used to get
a frame-level constant QP, denoted as $QP_f$. The first approach is to set $\alpha_i = \sigma_i$, so that
all the MB QPs are equal. The second method is to use the same $\alpha_i$ as that of the
MB level, as defined in the next section, and then use the mean, median or mode of the
histogram of the $\hat{Q}_i$ values to find $QP_f$.

In a preferred embodiment, the second method is used to better match the
MB QP. The frame-level quantization step size is decided by the mean of the
$\hat{Q}_i$ values, $Q_f = \sum_{i=1}^{N} \hat{Q}_i / N$. We note that there is a conversion between the quantization
parameter QP and the quantization step size $Q$ given by $Q = 2^{(QP - 6)/6}$. To reduce the temporal
quality variation between adjacent pictures, we set $QP_f = \max(QP'_f - D_f,\ \min(QP_f,\ QP'_f + D_f))$,
where $QP'_f$ is the frame QP of the last coded frame, and

$$D_f = \begin{cases} 2, & W < 0.7M \\ 4, & \text{otherwise} \end{cases}$$

Since scene changes usually cause higher buffer levels, we take advantage of the temporal masking
effect and set $D_f$ to a higher value when a scene change occurs.
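
The frame-level QP selection using the mean of the per-macroblock step sizes can be sketched as follows. The sketch reads the source's conversion as Q = 2^((QP - 6)/6) and takes the allowed QP change `D_f` as an input rather than fixing its value. The function names, the rounding to an integer QP, and the final clamp to the H.264 range 0 to 51 are assumptions of this example.

```python
import math

def qp_to_qstep(QP):
    """Quantization step size for a given QP, Q = 2**((QP - 6) / 6)."""
    return 2.0 ** ((QP - 6) / 6)

def qstep_to_qp(Q):
    """Inverse of qp_to_qstep (real-valued)."""
    return 6 * math.log2(Q) + 6

def frame_qp(Q_hat, QP_prev, D_f):
    """Frame QP from the mean of the initial macroblock step sizes Q_hat."""
    Q_f = sum(Q_hat) / len(Q_hat)                 # mean step size over the frame
    QP_f = round(qstep_to_qp(Q_f))                # assumed rounding to an integer QP
    # Limit the change relative to the last coded frame to +/- D_f.
    QP_f = max(QP_prev - D_f, min(QP_f, QP_prev + D_f))
    return max(0, min(51, QP_f))                  # assumed clamp to the H.264 QP range
```

Replacing the mean with `statistics.median` or a histogram mode would give the other two variants mentioned in the text.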

MB-layer rate control 

A first key feature of the MB-layer rate control is the adaptive selection of the
weighted distortion $\alpha_i$ to get a better perceptual quality. A second key feature is the
reduction of the variation of the MB QPs within the same picture.

For low-detail content, such as an ocean wave, a lower QP is required to keep
the details. But from an RDO point of view, a higher QP is preferred, because the
lower-detail content tends to give a higher PSNR. To keep a balance, we adopt
different settings of $\alpha_i$ for I and P pictures. For I pictures, a higher
distortion weight is given to the MBs with less detail, so that the detail can be better
retained. Accordingly, we set

$$\alpha_i = \frac{\sigma_i + 2\sigma_{avg}}{2\sigma_i + \sigma_{avg}}, \quad \text{where } \sigma_{avg} = \sum_{i=1}^{N} \sigma_i / N.$$

For P pictures, a higher distortion weight is given to the MBs with more residue
errors, as in [4]. Accordingly,

$$\alpha_i = \begin{cases} \dfrac{2B}{AN}(1 - \sigma_i) + \sigma_i, & B/(AN) < 0.5 \\ 1, & \text{otherwise} \end{cases}$$

In this way, better perceptual quality is maintained for I pictures and can be
propagated to the following P pictures, while higher objective quality is still kept as in
[4]. To prevent large variation of the quality inside one picture, we set
$QP_i = \max(QP_f - 2,\ \min(QP_i,\ QP_f + 2))$. If frame-level rate control is used, $QP_i = QP_f$.
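
The distortion weights and the macroblock QP clamp can be illustrated with the short Python sketch below. The function names are hypothetical, and the formulas follow the weights as they can be read from the text: a ratio of standard deviations for I pictures and the TMN8-style weight for P pictures.

```python
def intra_weights(sigma):
    """I-picture weights: macroblocks with less detail (small sigma) get more weight."""
    avg = sum(sigma) / len(sigma)
    return [(s + 2 * avg) / (2 * s + avg) for s in sigma]

def inter_weights(sigma, B, A):
    """P-picture weights: depend on sigma only when bits per pixel is below 0.5."""
    N = len(sigma)
    bpp = B / (A * N)                       # frame target in bits per pixel
    if bpp < 0.5:
        return [2 * bpp * (1 - s) + s for s in sigma]
    return [1.0] * N

def clamp_mb_qp(QP_i, QP_f, max_delta=2):
    """Keep a macroblock QP within +/- max_delta of the frame QP."""
    return max(QP_f - max_delta, min(QP_i, QP_f + max_delta))
```

For example, with residue standard deviations 1 and 5 the I-picture weights are 1.4 and about 0.85, so the flatter macroblock is quantized more finely.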

Virtual frame skipping 

After encoding one picture, we update $W$ by equation (3). If $W > 0.9M$, the
next frame is virtually skipped until the buffer level is below $0.9M$. Virtual frame
skipping codes every MB in the P picture in SKIP mode. In this way, we can
syntactically keep a constant frame rate. If the current frame is decided to be a
virtually skipped frame, we set $QP_f = QP'_f + 2$.
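
The skipping decision can be sketched as a small simulation. The function names and the `skip_bits` parameter (the assumed cost in bits of a picture coded entirely in SKIP mode) are hypothetical and are introduced only to show how the buffer drains while frames are virtually skipped.

```python
def update_buffer(W_prev, B_actual, R, F):
    """Buffer level after coding one picture, as in equation (3)."""
    return max(W_prev + B_actual - R / F, 0.0)

def should_virtually_skip(W, M, threshold=0.9):
    """True when the buffer is fuller than threshold*M."""
    return W > threshold * M

def count_virtual_skips(W, M, R, F, skip_bits):
    """Number of all-SKIP pictures coded before the buffer falls to 0.9*M or below."""
    if skip_bits >= R / F:
        # Assumption: a skipped picture must cost less than one frame of channel bits,
        # otherwise the buffer would never drain.
        raise ValueError("skip_bits must be smaller than R / F")
    skips = 0
    while should_virtually_skip(W, M):
        W = update_buffer(W, skip_bits, R, F)
        skips += 1
    return skips
```

Because every picture is still present in the bitstream, the decoder sees a constant frame rate even while the encoder is draining the buffer.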

In summary, our rate control scheme consists of the following steps:
preprocessing, frame target bit allocation and frame-level constant QP estimation,
MB-level QP estimation, buffer updates and virtual frame skipping control. Our
approach allows both frame-level and MB-level rate control.

Advantages/Benefits From Various Embodiments 

The use of mean/median/mode of initial macroblock QP estimates to select 
frame level QP. 

When the selected frame level QP is used in the calculation of the individual 
macroblock QPs. 

When performing intra prediction using a subset of the allowable intra- 
prediction modes to form the residue that is used in the QP selection process. 

Use of a small number of intra-prediction modes (three (3), for example). 

When a previous GOP was coded with a large number of unused bits, limiting 
the additional bits allocated to the current GOP to a predetermined threshold. 

When a virtual buffer level instead of the real buffer level is used for buffer
control.

Figure 1 shows a video encoder, indicated generally by the reference
numeral 100. An input to the encoder 100 is connected in signal communication with
a non-inverting input of a summing junction 110. The output of the summing junction
110 is connected in signal communication with a block transform function 120. The
transformer 120 is connected in signal communication with a quantizer 130. The
output of the quantizer 130 is connected in signal communication with a variable
length coder ("VLC") 140, where the output of the VLC 140 is an externally available
output of the encoder 100.
The output of the quantizer 130 is further connected in signal communication
with an inverse quantizer 150. The inverse quantizer 150 is connected in signal
communication with an inverse block transformer 160, which, in turn, is connected in
signal communication with a reference picture store 170. A first output of the
reference picture store 170 is connected in signal communication with a first input of
a motion estimator 180. The input to the encoder 100 is further connected in signal
communication with a second input of the motion estimator 180. The output of the
motion estimator 180 is connected in signal communication with a first input of a
motion compensator 190. A second output of the reference picture store 170 is
connected in signal communication with a second input of the motion compensator
190. The output of the motion compensator 190 is connected in signal
communication with an inverting input of the summing junction 110.